10 research outputs found
Deep Neural Ensemble for Retinal Vessel Segmentation in Fundus Images towards Achieving Label-free Angiography
Automated segmentation of retinal blood vessels in label-free fundus images
plays a pivotal role in computer-aided diagnosis of ophthalmic pathologies,
viz., diabetic retinopathy, hypertensive disorders and cardiovascular
diseases. The challenge remains active in medical image analysis research
due to the varied distribution of blood vessels, which differ in size and
physical appearance against a noisy background.
In this paper we formulate the segmentation challenge as a classification
task. Specifically, we employ unsupervised hierarchical feature learning
using an ensemble of two levels of sparsely trained denoising stacked
autoencoders. First-level training with bootstrap samples ensures
decoupling, and the second-level ensemble, formed from different network
architectures, ensures architectural diversity. We show that ensemble
training of autoencoders fosters diversity in the learned dictionary of
visual kernels for vessel segmentation. A softmax classifier is used to
fine-tune each member autoencoder, and multiple strategies are explored for
two-level fusion of the ensemble members. On the DRIVE dataset, we achieve
a maximum average accuracy of 95.33\% with an impressively low standard
deviation of 0.003 and a Kappa agreement coefficient of 0.708. Comparison
with other major algorithms substantiates the high efficacy of our model.
Comment: Accepted as a conference paper at IEEE EMBC, 201
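The two-level scheme described in the abstract can be illustrated with a minimal sketch: first-level members are decoupled by bootstrap resampling, and a second-level fusion averages the members' softmax posteriors. The member count, toy logits, and averaging fusion below are illustrative assumptions; the paper explores several fusion strategies and trains real autoencoders.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bootstrap_sample(X, y, rng):
    # First-level decoupling: each member trains on a bootstrap resample.
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

def fuse(member_logits):
    # Second-level fusion: average the members' softmax posteriors
    # (one of several possible fusion strategies).
    probs = np.stack([softmax(l) for l in member_logits])
    return probs.mean(axis=0)

# Toy demo: 3 members scoring 4 patches as vessel / non-vessel.
logits = [rng.normal(size=(4, 2)) for _ in range(3)]
fused = fuse(logits)
labels = fused.argmax(axis=1)  # 0 = background, 1 = vessel
```

Averaging posteriors rather than hard votes preserves each member's confidence, which matters when members trained on different bootstrap samples disagree.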
Lightweight Modules for Efficient Deep Learning based Image Restoration
Low-level image restoration is an integral component of modern artificial
intelligence (AI) driven camera pipelines. Most of these frameworks are
based on deep neural networks, which present a massive computational
overhead on resource-constrained platforms such as mobile phones. In this
paper, we propose several lightweight low-level modules which can be used
to create a computationally low-cost variant of a given baseline model.
Recent works on efficient neural network design have mainly focused on
classification. However, low-level image processing falls under the
'image-to-image' translation genre, which requires some additional
computational modules not present in
classification. This paper seeks to bridge this gap by designing generic
efficient modules which can replace essential components used in contemporary
deep learning based image restoration networks. We also present and analyse our
results highlighting the drawbacks of applying depthwise separable
convolutional kernel (a popular method for efficient classification network)
for sub-pixel convolution based upsampling (a popular upsampling strategy for
low-level vision applications). This shows that concepts from domain of
classification cannot always be seamlessly integrated into image-to-image
translation tasks. We extensively validate our findings on three popular tasks
of image inpainting, denoising and super-resolution. Our results show that
the proposed networks consistently produce reconstructions visually similar
to those of the full-capacity baselines, with significant reductions in
parameter count, memory footprint and execution time on contemporary mobile
devices.
Comment: Accepted at: IEEE Transactions on Circuits and Systems for Video
Technology (Early Access Print) | Codes available at:
https://github.com/avisekiit/TCSVT-LightWeight-CNNs | Supplementary
document at:
https://drive.google.com/file/d/1BQhkh33Sen-d0qOrjq5h8ahw2VCUIVLg/view?usp=sharin
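The tension between depthwise separable convolution and sub-pixel (PixelShuffle) upsampling that the abstract highlights can be seen in a back-of-the-envelope parameter count: the conv feeding a PixelShuffle must emit C_in * r^2 channels. The channel width, kernel size and upscale factor below are illustrative assumptions, not the paper's actual configurations.

```python
# Parameter counts (bias-free) for the conv layer feeding a sub-pixel
# (PixelShuffle) rearrangement, which needs c_in * r^2 output channels.

def standard_conv_params(c_in, c_out, k):
    # Full k x k convolution.
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    # k x k depthwise conv followed by a 1x1 pointwise conv.
    return k * k * c_in + c_in * c_out

c_in, r, k = 64, 2, 3
c_out = c_in * r * r  # 256 channels before a 2x PixelShuffle

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep)  # 147456 16960
```

The separable variant is far cheaper on paper, which is exactly why its qualitative drawbacks for sub-pixel upsampling (rather than its cost) are the interesting finding here.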
Directed Diffusion: Direct Control of Object Placement through Attention Guidance
Text-guided diffusion models such as DALLE-2, Imagen, and Stable Diffusion
are able to generate an effectively endless variety of images given only a
short text prompt describing the desired image content. In many cases the
images are of very high quality. However, these models often struggle to
compose scenes containing several key objects such as characters in specified
positional relationships. The missing capability to "direct" the placement of
characters and objects both within and across images is crucial in
storytelling, as recognized in the literature on film and animation theory. In
this work, we take a particularly straightforward approach to providing the
needed direction. Drawing on the observation that the cross-attention maps for
prompt words reflect the spatial layout of objects denoted by those words, we
introduce an optimization objective that produces "activation" at desired
positions in these cross-attention maps. The resulting approach is a step
toward generalizing the applicability of text-guided diffusion models beyond
single images to collections of related images, as in storybooks. To the best
of our knowledge, our Directed Diffusion method is the first diffusion
technique that provides positional control over multiple objects, while making
use of an existing pre-trained model and maintaining a coherent blend between
the positioned objects and the background. Moreover, it requires only a few
lines to implement.
Comment: Our project page:
https://hohonu-vicml.github.io/DirectedDiffusion.Pag
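The core idea, rewarding cross-attention mass inside a target region, can be sketched with a toy objective. The map size, uniform attention map, and the particular loss form below are simplifying assumptions for illustration; the actual method optimizes latents of a pre-trained diffusion model.

```python
import numpy as np

def region_activation_loss(attn, mask):
    # Encourage the cross-attention map of a prompt word to concentrate
    # inside the target region: loss decreases as in-region mass grows.
    inside = (attn * mask).sum()
    return 1.0 - inside / attn.sum()

attn = np.full((8, 8), 1.0 / 64)  # uniform toy attention map
mask = np.zeros((8, 8))
mask[:4, :4] = 1.0                # desired region: top-left quadrant
loss = region_activation_loss(attn, mask)
print(loss)  # 0.75: only a quarter of the mass lies in the region
```

Minimizing such a loss with respect to the model's latents (not shown) would pull the denoted object toward the masked region, which matches the abstract's description of directing placement through attention guidance.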
Text Cutoff Detection for Document Images
This disclosure describes techniques for detection of text cutoff in captured images of documents that include text. Optical character recognition (OCR) is applied to an input image. A bounding box for each text character (OCR symbol) is determined, defined by the x and y coordinates of its four corners. A feature vector representing the spatial locations of the OCR symbols extracted from the image is constructed from the symbol coordinates and provided to a trained classifier, which determines a class label for the input document indicating whether the document includes text cutoff. Optionally, the area of an image that includes text is automatically determined and used to limit the region of the image considered in downstream document processing.
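One plausible instance of the feature vector described above summarizes how close the OCR symbols come to each image border; text hugging a border is a cutoff signal. The specific four-component summary, box format, and toy coordinates below are assumptions for illustration, not the disclosure's exact construction.

```python
import numpy as np

def cutoff_feature_vector(boxes, width, height):
    # boxes: sequence of (x_min, y_min, x_max, y_max) per OCR symbol.
    # Normalize corner coordinates to [0, 1] and record the extreme
    # extents of the text block toward each image border.
    b = np.asarray(boxes, dtype=float)
    norm = b / np.array([width, height, width, height])
    return np.array([
        norm[:, 0].min(),  # leftmost extent
        norm[:, 1].min(),  # topmost extent
        norm[:, 2].max(),  # rightmost extent
        norm[:, 3].max(),  # bottommost extent
    ])

boxes = [(5, 10, 40, 30), (50, 10, 95, 30)]
feat = cutoff_feature_vector(boxes, width=100, height=100)
print(feat)  # [0.05 0.1  0.95 0.3 ]
```

A classifier fed such features could learn, for example, that a rightmost extent near 1.0 correlates with cut-off text along the right edge.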